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ABSTRACT 

We conduct Bayesian model inferences from the observed if -band luminos- 
ity function of galaxies in the local Unive rse, using the se mi-analytic model 



(SAM) of galaxy formation introduced in iLu et al.l (|2011[) . The prior distri- 
butions for the 14 free parameters include a large range of possible models. 
We find that some of the free parameters, e.g. the characteristic scales for 
quenching star formation in both high-mass and low-mass halos, are already 
tightly constrained by the single data set. The posterior distribution includes 
the model parameters adopted in other SAMs. By marginalising over the 
posterior distribution, we make predictions that include the full inferential 
uncertainties for the colour-magnitude relation, the Tully-Fisher relation, the 
conditional stellar mass function of galaxies in halos of different masses, the HI 
mass function, the redshift evolution of the stellar mass function of galaxies, 
and the global star formation history. Using posterior predictive checking with 
the available observational results, we find that the model family (i) predicts a 
Tully-Fisher relation that is curved; (ii) significantly over predicts the satellite 
fraction; (iii) vastly over predicts the HI mass function; (iv) predicts high- z 
stellar mass functions that have too many low mass galaxies and too few high 
mass ones, and (v) predicts a redshift evolution of the stellar mass density 
and the star formation history that are in moderate disagreement. These re- 
sults suggest that some important processes are still missing in the current 
model family and we discuss a number of possible solutions to solve the dis- 
crepancies, such as interactions between galaxies and dark matter halos, tidal 



© 2012 RAS 



2 



stripping, the bimodal accretion of gas, preheating, and a redshift-dependent 
initial mass function. 
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1 INTRODUCTION 

During the past 20 years semi-analytic models of galaxy formation have been developed and 



widely adopted to study the sta 
matter (CDM) cosmogony (e.g. 


;istical properties of the galaxy population in the cold dark 


White & Frenk 199ll: 


Lacev & Silk 


1991: 


^auffmann et al. 


1993: 


Cole et al.ll994; Mo et al. 


1998: Somerville & Kolattll999: Cole et al. 


200ol Kane et al. 


2005; 


Croton et al. 


20061 Dutton & van den Bosch 200? 


). In a semi-analytic model (hereafter 



SAM), one adopts recipes to describe and parametrise the underlying physical ingredients, 
such as star formation and feedback. The free parameters in the models are then tuned to 
reproduce certain observational properties of the galaxy population. Since a variety of the 
physical proces ses that affect galaxy formation and evolution are still poorly understood (e.g 



Mo et al 



20101 ). one must quantitatively characterise the model constraints implied by the 
existing data sets as well as explore a wide range of models. The SAM approach provides 
a promising avenue to fulfil these tasks owing to its flexibility in implementation and its 
relatively fast speed in computation. By translating the theory of galaxy formation into a 
set of model parameters, SAMs can be used to make model inferences from observational 
data, and to make predictions for further tests of the theory. 

In conventional implementations of SAMs, model inferences and predictions are per- 
formed in two steps. One first tunes the model against a set of observational constraints 
to find a parameter set, and then uses this parameter set to make predictions for other 
observables. For example, one can choose model parameters governing the efficiency of star 
formation and stellar energy feedback so that a "Milky Way" halo contains, on average, 
the same mass in stars and cold gas as our own Galaxy, and adjust the dynamical friction 
time scale so that a "Milky Way" halo contains, on average, the right number of "Magellanic 
Cloud" - sized satellites. Such req u irements, together wi t h the Tully-Fisher relation, h ave been 



used by 



Kauffmann et al. 



(1999); 



their SAMs. Alternatively, 



Somerville fc Kolatt 



Cole et al. 



(1994 



2000); 



1999); 



De Lucia et al. 



Kang et al 



()2005j); 



(2004) to tune 



Baugh et al 



fl2005f ): 
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Bower et al. 



(120061 ) . among others, chose to tune their SAMs with the observed luminosity 
functions of galaxies. 

There are a number of problems with such implementations. First, a satisfactory fit is 
typically assessed by eye so that uncertainties in the observational data, a crucial aspect 
in model inference, is not properly taken into account. Second, the tuning is usually done 
by "hand" . One typically adjusts one parameter at a time until one obtains a satisfactory 
fit to the data. The problem is that the likelihood function is typically comp lex and it is 
very difficult to find the global optimum with hand tuning ( iLu et al.l 120111 ) . Third, this 
approach only yields a single parameter set so that model predictions are made without 
including uncertainties in the inference. Such predictio ns are questionable because of the 



fact that the model parame ters are largely degenerate (IBower et al 



2010 



Lu et al 



2011 



Neistein fc Weinmann 



20101 ) . For all the above reasons, the full potential power of SAMs 



has not been fully realised. To derive meaningful constraints from observations and to make 
reliable predictions, one needs to know the relative probability of various model parameters 
and, indeed, the probability of entire model families given some set of observational data. 
This i s best achieved by a Bayesian inference. 

In lLu et al.l (120111 ). we have developed a scheme to incorporate SAMs into the framework 
of Bayesian inference. To this end, we have constructed a general SAM that contains a num- 
ber of published SAMs as subsets. We have also shown that, aided with advanced MCMC 
techniques and parallel computation, it is now possible to build a Bayesian inference-based 
SAM to efficiently explore the high dimensional parameter space and to establish the pos- 
terior distribution of the model parameters reliably. In the Bayesian framework, one first 
chooses a set of observations as data constraints to derive the posterior probability distribu- 
tion of the model parameters. Such a posterior distribution encompasses the uncertainties of 
the model parameters and the observational constraints, allowing one to assess the theory in 
a statistically rigorous way. Furthermore, model predictions can be made from the posterior 
distribution including these uncertainties, providing an avenue to assess the power of future 
observations. Th e general schem e of the method along with some simple examples have 



been presented in 



Lu et al. 



(120111 ). Following our previous paper, we review Bayesian model 



prediction and introduce a systematic model 



Gelman et al. 



checking pro cedure called Bayesian posterior 



2004: Gilks 



19951 ). This technique exploits the full 



predictive checking (PPC, 
predictive power of SAMs. 

In this paper, we start to put the conventionally, though potentially flawed, two-step 
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procedure of first fixing the model parameters using one data set and then using them to 
predict other observations on a firm statistical footing. We apply Bayesian inference with 
PPC to the observed i^-band luminosity function of local galaxies, taking into account all 
the known uncertainties in the data, to derive realistic constraints on the model parameters 
of an extended model family. First, we use observations to constrain a SAM and explore 
the implications of these constraints on the underlying physical processes affecting galaxy 
formation and evolution. Second, we use the full posterior distribution of the model param- 
eters obtained to make predictions for a number of other observable properties of galaxies. 
These include (i) the cold gas mass function of galaxies; (ii) the Tully-Fisher relation; (iii) 
the colour-magnitude relation of galaxies; (iv) the conditional stellar mass function of galax- 
ies in halos of various masses; and (v) the redshift evolution of the stellar mass function 
of galaxies, the star formation rate density, and the cold gas mass density. These model 
predictions are compared with available observational data to check whether the current 
model family is able to accommodate the observational results. For most of the predictions, 
we apply the quantitative Bayesian model checking method introduced here to assess the 
model. For other predictions, for example the cosmic stellar mass density as a function of 
redshift, the cosmic star formation rate density as a function of redshift, and the cold gas 
mass density as a function of redshift, the error model for the comparison is uncertain. For 
these predictions, we only check the model with the available data graphically. 



The paper is organised as follows. In £j2j we describe our Bayesian approach, including 
the MCMC technique for sampling the posterior, Bayesian model prediction, and Bayesian 
posterior model checking. £J3] briefly describes our semi-analytic model and §1] presents the 
data used to constrain the model and defines the likelihood function. In §5] we show how the 
i^-band luminosity function of galaxies constrains the posterior distribution of the model pa- 
rameters. In [|6l we use the posterior obtained to make predictions for the colour-magnitude 
relation, the Tully-Fisher relation, the cold gas mass function, the conditional stellar mass 
function, and the redshift evolution of the stellar mass function, and compare them with 
observational data. Finally, we summarise and discuss our results further in §71 



Throughout the paper, we use a ACDM cosmology with f2 M ,o = 0.26, Q^ t0 = 0.74, 
Qb,o = 0.044, h = 0.71, n = 0.96 , and or = 0.80. The se values are consistent with the 



WMAP5 data (IDunklev et al. 



2009 



Komatsu et al. 



2009|). 
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2 THE BAYESIAN APPROACH 



2.1 MCMC simulations of the posterior distribution 

As detailed in 



Luetal 



(120 111 ), a variety of physical processes affecting galaxy formation are 
not yet well understood while copious observational data constrain the models. To derive 
meaningful constraints from the observations, we need to know the probability of the model 
parameters given the data. The Bayesian approach allows us to obtain this posterior distri- 
bution of the model parameters for a given set of data and to make robust predictions taking 
into account uncertainties present in the model. To sam ple the posterior probability distri- 



bution, we employ the Bayesian Inference Engine (BIE, IWeinberg 



20121 ) 



which includes 



a suite of advanced Markov- Chain Monte-Carlo (MCMC) algorithms and supports parallel 
computation. In particular, we adopt the Tempered Differential Evolution (TDE) algorithm 
to sample the posterior. The MCMC algorithm provides proposal parameter vectors for the 
SAM, and the SAM predicts the galaxy population for the given set of model parameters. 
The likelihood of the data given the model is evaluated and returned to the MCMC program. 
The MCMC algorithm accepts or rejects the proposal based on the posterior probability, and 
generates a new proposal for the SAM. To ensure that the chains have sufficiently explored 
the parameter space, we first run the MCMC at a higher temperature, namely we sample 
the more diffuse distribution function defined by 



p\9\D) oc P (9\Dy/ T p, 



(1) 



where p(9\D) is the real posterior probability, and T p is the so-called powered-up temperature 
with T p > 1. Since p'(8\D) is more diffuse (i.e. flatter) than p(9\D), the Markov chain 
can jump out of a local mode with higher probability and hence explore a larger range 
of parameter space. After the chains are converged using the hotter state, we resume the 
simulation from the current states at the fiducial temperature, i.e. T p = 1. The MCMC 
simulation again continues until convergence is achieved. The converge nce of the chains 
is monitored by the Gelman- Rubin R statistic (IGelman fc Rubin! Il992l ). and we declare 
convergence when R < 1.2. 



1 http://www.astro.umass.edu/BIE 
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2.2 The posterior predictive distribution 

Once we have the posterior distribution, we can make predictions for other observables b y 



marginalising the desired likelihood function over the posterior (e.g iGelman et al.ll2004f ). 
The predicted distribution of a new observable, y', given the data constraint y c , is 

p(y'|yc) = J P(y'\0)p(9\y c )d9 , (2) 

where 9 denotes the model parameter vector, p{9\y c ) is the posterior distribution obtained 
from the data constraint y c , and p{y'\9) is the probability distribution of the observable 
y' for a given model specified by 9 (i.e. a likelihood function). For deterministic models, 
the distribution function of predicted observations, p{y'\9) is a 5 function, 5[y' — y'{9)]. 
For probabilistic models, if the variance of the prediction y' from a given model 9 is much 
smaller than the variance from the posterior, the 5 function is also a good approximation. The 
resulting distribution function p(y'|y c ), called the posterior predictive distribution (PPD), 
encompasses all the inferential uncertainties and hence provides the confidence level of the 
predicted observable. For a complex model like the SAM considered here, the PPD can be 
obtained using MCMC samples. To do this, one first selects a sample from the converged 
posterior distribution {9}. For each of the 9 e {9} selected, the predictions of y' are obtained 
from the probability distribution, p(y'\9), and these y' are a sample of the PPD. 

2.3 The posterior predictive check 

Once the posterior predictive distribution is obtained, one can check the specific model 



fami 



2004 



using 



Gilks 



a pr ocedure called posterior predictive check (hereafter PPC, 



Gelman et al. 



19951 ) . The central idea of PPC is that the data replicated from the model 
should be distributed as the observed data. Any discrepancies then indicate that the model 
may be incorrectly specified. The PPC also applies to new observables that are not included 
in the data constraint. If the model is true, the PPD should not show a large inconsistency 
with the data of those predicted observables. 

In practise, a graphical representation of the PPD and the data distribution or its sum- 
mary quantities may be sufficient to identify discrepancies. The later option is particularly 
useful when the data set is large or a particular aspect of the data contains important in- 
formation. When graphical PPCs do not reveal the discrepancies between the models and 
data, one can perform numerical PPCs, which are a quantitative measure of the discrepan- 
cies. To perform a numerical PPC, one first needs to define a test statistic T(y, 9) designed 
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to discriminate between the model and the data. In general, the test statistic may depend 
explicitly on 9, but in our applications such a dependence is absent and so in the follow- 
ing we will omit 9 from the independent variables of T ■ In this paper, we use the tail-area 
probability (e.g. the p-value) of the test statistic T(y) to assess the lack of a fit to the data. 

To motivate a Bayesian definition of a p- value test, we first note that the classical p- value 
is defined as 

Pc = P[T(y')>T(y)\9] , (3) 

where the probability, P, is calculated over the distribution of y' with 9 fixed. In classical 
testing, 9 would correspond to the null hypothesis value. It could also be a point estimate 
such as a maximum likelihood estimate. In the Bayesian context, we can generalise the test 
statistic to allow for a dependence on the model parameters under their posterior distribu- 
tion, so that both the variance of the observations (y) and the uncertainties of the parameter 
values (9) are taken into account. Thus the Bayesian p-value is 

PB = P[T(y') > T(y)|y c ] = j J I nyl) > T(y) p{y'\9)p{9\y c )dy'd9 1 (4) 

where I q is the indication function for the condition q (I q equals 1 if q is true and otherwise). 
Note that the testing data y can be the same as the constraining data y c or some other 
data. If the predicted observables y' are incompatible with the model, then the observed 
test statistic T(y) may be a significant outlier of the distribution of the test statistic T(y') 
predicted by the model. If the posterior predictive p- value is close to or 1 (typically chosen 
to be 0.05 or 0.95), then the model is most likely inadequate. Note that this approach is 
similar to classical hypothesis testing, where a test statistic T measures the discrepancy 
between the data and the predictive simulations. 

Usually we cannot calculate the Bayesian p-value analytically, but we can do it using 
posterior simulations. Suppose that we have L samples of 9, (9\, ■ ■ -,9l), randomly drawn 
from the posterior distribution p(6\y c ). Then for each of these 9 samples, we can generate 
one sample y( from p(y'\9i). The Monte Carlo evaluation of equation (j3J) is then 
1 L 

Pb = 7E J T(y;)>T(y), (5) 
L i=i 

where y[ is the prediction of sample 9\. In other words, the fraction of samples where T(y') > 
T(y) is an estimate of Pb- Note that the test statistic T(y) needs to be chosen to effectively 
investigate the deviations of interest. This is similar to choosing a powerful test statistic 
when conducting a hypothesis test. 
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The Bayesian PPC includes all the inferential uncertainties implied by the constraining 
data and provides confidence bounds for the predicted quantities. Any significant inconsis- 
tency between the predictions and the data suggests that modifications to the model are 
required. As such, the Bayesian PPC provides a powerful method to test the admissibility 
of models given data. The method, however, also has its limitations because PPC does not 
provide a probability for rejecting a model. First, if a model family passes a PPC, it does not 
necessarily mean the model family is free of problems; it may only have passed because the 
chosen test statistic was insufficiently powerful. Therefore, the choice of the test statistics is 
crucial. Second, although a large difference between the PPD and the observational data in- 
dicates tensions between the model and the data, one cannot reject a model family, because 
an improper prior distribution can also result in a biased PPD. The only way to identify 
modes that can simultaneously explain multiple data sets is to perform the Bayesian infer- 
ence using the full data sets, and to use the posterior to conduct a Bayesian goodness-of-fit 
test. 

In this paper, we will use PPC to identify tensions between our SAM and a variety of 
existing data sets as follows. Suppose that we have drawn L samples from the posterior 
distribution and that the predicted observables are given in N bins. Denote the value of the 
prediction of the Zth parameter vector in the zth bin by y' li . We define 

7T=T(yO = E %^ , (6) 

i=l °i 

where and af are, respectively, the mean and the standard deviation obtained from the 
L posterior samples^. The histograms of 77 (I = 1, • • •, L) are then used to represent the 
probability distribution of the test statistic T predicted by the model. To compare with the 
observations, we define a similar test statistic from the observational data y{. 

T obs = r(y) = £ Wi Vi) j (?) 
i=l a i 

where y\ and of are the same as in equation ([6]). Comparing the test quantity from the 
observations with the distribution of the test quantity predicted by the model, the p-value 
then tells us the odds of having such observational data given the constrained model. 

If there are M independent observations of y m (m = 1, ■ ■ -, M), then following equation 
(J6j) one can compute the test quantity for each of the M observations: 



2 The model is specified, and it is the parameters that are being sampled. 
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N ( _ — / \2 

i-obs tY_, \ \ ^ lZ/m,i 1/ i) / Q \ 

>m = 1 VYm) - ~J2 ' l°J 



i=l °i 



The histograms given by 7^ bs (m = 1, • • -,M) then represent the probability distribution 
of 7~ obs . Data are often presented as the mean of a quantity together with error bars that 
describe the uncertainties of the measurements. In this case, one can generate M replica 
according to the error budget in the data and construct the distribution of the test quan- 
tity from these replica to take into account the observational uncertainties. The difference 
between the model and the data is then given by comparing the distribution of 71 and that 
of 7^° bs . In our following applications, we treat the observational mean as one realisation of 
the observable in question. In this case, we calculate the value of T obs from equation (J7|), 
now with y set to be the observational mean, and compare it with the distribution of 7j. 

One potential problem of a test based on a x 2 -like test quantity, like that described 
above, is that the power of the test may be diminished when the bins are strongly correlated 
and the number of bins is large. The resulting relatively large number of dependent variables 
will weaken the power of T. The power can be improved by incorporating the covariance 
and including only the independent degrees of freedom. Pri ncipal component analy sis (PCA) 



can achieve this using the following widely-used procedure (IMurtagh &: Hecklll987l ). We first 



construct a data matrix Y' from L model predictions of y[ (I = 1, ■ ■ -,L), which has N 

columns and L rows. We then zero the centre by the mean, y£, and scale the data by the 

standard deviation, <7j, of each column of the data matrix, yielding Y s . The PCA, which 

we perform using singular value decomposition (SVD), yields N unit eigenvectors, e*, and 

N corresponding eigenvalues, Aj. We construct a. N x N transformation matrix, U, by 

putting each eigenvector on each row. In the matrix, the eigenvectors are ordered so that 

the one with a larger eigenvalue is put on a upper row. The matrix of eigenvectors is a 

unitary transformation of the data Y s to a space where each dimension is uncorrelated. The 

eigenvalues describe the variance of the data in this new space. Using the transformation 

matrix, we find the transformed data transposed as X' T = UY S T . We may now write a new 

correlation-free test statistic for each row vector of the X' as: 
R ™/2 

V-Ep ( 9 ) 

i=l A i 

where R < N will be specified below. Equation (j5J) is not equivalent to equation (JBJ) although 
it does have a similar interpretation: we expect the T{ to be distributed as a multivariate 
normal distribution, appealing to the central limit theorem in the large L limit. Assume that 
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we have ordered the eigenvalues such that Aj > Xj if % < j. Recall that the eigenvector with 
the highest eigenvalue is the principal component of the data set that preserves the largest 
uncorrelated fraction of the total initial variance. In many cases, the magnitude of the Xj 
decreases quickly with increasing j. Components with Xj <C Ai carry little information. This 
allows us to truncate the summation in equation ([9]) by choosing R to be the smallest integer 
such that Y,i=R+i Ai/X^li < e R- H ere ; we choose 6r = 0.01. We find that this criterion 
preserves most of the information of the posterior prediction distribution and ignores the 
details that can not be distinguished given the level of the observational errors present. 
Once we determine the truncation component, we then define a modified transformation 
matrix Ur by setting all the elements except the first R rows of the matrix U to zero, and 
compute the transformation of the data as X /T = UrY s t . For the observational data vector, 
we follow exactly the same adjustment and transformation defined by the prediction data 
matrix, which yields the test quantity for the observations. We use the reference distribution 
of the test statistic (eq. [9]) computed from the posterior sample to calculate the p-value of 
the observations. 



3 MODEL AND MODEL PARAMETERS 

We employ the SAM developed by 



Lu et al. 



( 120111 ). in which the parameterisations for star 
formation and supernova (SN) feedback are generalised to encompas s many existing models. 



Lu et al. 



2011 



) for more d etails. 



Here we briefly describe the model and readers can refer to 
Our SAM starts with Monte Carlo derived halo merger trees ( iParkinson et al.ll2008f l using 
the current ACDM model, and includes important physical processes for galaxy formation, 
such as gas cooling, star formation, supernova (SN) feedback, galaxy mergers, and AGN 
feedback. Gas is assumed to be heated by accretion shocks and to form a hot gaseous 
halo that cools by radiative cooling. Owing to a reduced cooling rate and heating by AGN 
feedback, gas cooling is assumed to be unimportant in massive halos. We model this using a 
free parameter, Mqc, the halo mass above which radiative cooling becomes negligible. The 
cooling gas settles into the halo centre as a disk of cold gas, where stars form in regions where 
the surface density of the disk is sufficiently high. We use a free parameter, / SF , to control 
the cold gas surface density threshold for star formation; only gas above the surface density 
threshold can form stars. The star formation efficiency is assumed to be proportional to the 
total cold gas mass for star formation and inversely proportional to the dynamical time scale 
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of the disc, with a overall efficiency assumed to be a broken power law of the halo circular 
velocity, t> vir : e* = ctsF for v vir > Vsf, and = asF(tw/VsF)^ SF for v v - n < Vsf- The amplitude, 
«sf, the power index, /3 SF , as well as the pivotal circular velocity, Vsf, are all treated as free 
parameters. The SN feedback associated with star formation is assumed to reheat a fraction 
of the cold gas in the galaxy and may drive an outflow that can remove some of the baryons 
from the host halo. The fraction of the total SN energy that affects subsequent star formation 
is assumed to be cksn- The mass of the reheated cold gas is assumed to be proportional to 
the stellar mass, with the proportionality given by / r h = aRH^Vo/tw)^ 1 " 1 , where V is set 
to be 220km s" 1 , and orr and /3rh are free parameters. A fraction of the SN energy that is 
not used to reheat the gas is assumed to drive galactic winds, and this fraction is controlled 
by a free parameter ew- Finally, a fraction of the ejected baryonic mass is assumed to come 
back to the halo as hot halo gas on a dynamical time scale, and this fraction is controlled 
by a free parameter, /rj. When two or more dark matter halos merge, the central galaxy of 
the more massive halo is assumed to become the central galaxy of the new halo. The time 
over which a satellite galaxy orbits in its host halo before merging into the central galaxy 
is calculated based on the dynamical friction timescale of the secondary halo that hosts the 
satellite galaxy, and the real merging timescale is assumed to be a free parameter, /df, times 
this dynamical friction timescale. When a satellite galaxy merges into a central galaxy, a 
fraction of the total cold gas in these two merging galaxies is converted into stars through a 
starburst, and this fraction is assumed to be given by the satellite-to-central mass ratio as 
a SB(^sat/ m cen) /3sB 5 with cksb and /3sb two free parameters. A morphological transformation 
may occur depending on the mass ratio between the two merging progenitors. 



The processes included in our SAM are similar to those in other semi-analytic models, 
but our parameterisations of the physical processes are designed to cover many published 
SAMs as subsets and encompass the physically plausible ranges for these processes. The free 
parameters are summarised in Table [U The prior ranges for the parameters are listed in the 
last column of the table. The prior distributions of the parameters (some are taken to be 
logarithmic) are simply assumed to be uniform, as we have limited knowledge about them. 
In total, we have 14 free parameters including a parameter describing the incompleteness 
of the K-band luminosity function at the faint en d (see §1]). We adopt a recently updated 
stellar population synthesis model ( iBruzuall 120071 ) to convert the predicted stellar masses 
into K-band light. 
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4 DATA AND LIKELIHOOD 



In 


Lu et al. 


f 


2011) 


Bell et al. 


( 


2003b|) 



(2011) we adopted an approximate stellar mass function of galaxies based on 



( ]2003bl ). instead of the i^-band luminosity function, as the observational con- 



straint just to demonstrate the viability of our Bayesian approach to SAMs of gala xy forma- 



tion a nd to illustrate some basic facts about the approach. However, as we showed in lLu et al 
(120111 ). systematic errors associated with measurements of the stellar mass function of galax- 
ies are difficult to treat in the likelihood function because their statistical properties are not 
well understood, and an improper treatment of the systematic errors can result in biased 
inferences. In contrast, making a likelihood function in terms of the luminosity function is 
rather straightforward because the luminosity of a galaxy is a direct observable, and mea- 
suring the luminosity function is simply a counting process. When systematic errors in the 
luminosity measurement are negligible, the data in each luminosity bin is independent. As 
our goal in the present paper is to derive a reliable posterior, we choose to use the .fT-band 
luminosity function with a realistic error model as our observational constraint. 



The K-band luminosity f unction obtained by iBell et al.l ( 12003bl ) is based on the 2MASS 
Extended Source Catalogue (jJarrett et al.l l2000l ) with an incompleteness correction based 
on the SDSS. To model the error budget, one may first predict the properties of the galaxy 
population, and then simulate the process of measuring the luminosity function. Because 
the luminosity function could be incomplete for faint galaxies, corrections for observational 
selection effects should be made to the model prediction. In what follows we introduce our 
treatments for both counting errors and sample incompleteness. 



We first formulate the counting process for the binned luminosity function. When sys- 
tematic errors in the luminosity measurements are negligible, this is a Poisson process and 
we use the number counts in each luminosity bin to compute the likelihood. Unfortunately, 
the observational number counts are not available to us. Here we use an alternative ap- 
proach using the observed luminosity function to obtain the number counts. For a given 
absolute magnitude, M$, we estimate the largest luminosity distance within which a galaxy 
with such an absolute magnitude can be observed in a survey w ith an app a rent m agnitude 



Bell et al. 



fl2003bh is 13.57 



limit mn m . The apparent magnitude limit of the sample used in 
for the K-band, and the sky coverage is 414 square degrees. Using this information, we can 
estimate the maximum observational volume, V$, for a galaxy with an absolute magnitude 
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of Mj. Assuming that galaxy clustering is negligible on the scale of this maximum volume, 
we can write the number of galaxies with this absolute magnitude as 

n a (M) = Integer($,AM i \/ i ) , (10) 

where $j = $(Mj) is the luminosity function at an absolute magnitude M i; and AMj is the 
bin size. The logarithmic Poisson likelihood for a given model that predicts the luminosity 
function as is 



InL = ]T MM*) In (^AM^) - ^AM^ - V (1 + n a (M))] 



(11) 



i=0 



where the summation is over all the magnitude bins, and T is the Gamma function. 

However, when there is faint-end incompleteness, one should not expect $,V* galaxies in 
the faint-end bins. We define the completeness fraction of a magnitude bin i as the ratio 
between the number of the observed galaxies and the total number of galaxies in a volume 
limited sample, i.e. pi = $ b s ,i/$j, where $ b s ,iAMj is the observed number density in the 
ith bin, while $jAMj is the actual number density. Thus, if a model pre dicts Nj ga l axies in 
the magnitude bin, then the number of galaxies to be observed is PiNi. iBell et al.l ( 12003bl ) 
estimated the incompleteness in the if-band at the faint end using SDSS data, and found that 
for a complete sample the slope of the .fT-band luminosity function at Mk — 5 log 10 h > —21 
could be as steep as —1.33, compared to the slope of —0.93 obtained directly from the data 
( iCole et al.l l200lh . This suggests that the completeness ratio may be approximated by a 
power law in luminosity, or equivalently, an exponential function of absolute magnitude. 
Following this observational result, we assume that Pi is unity for Mk — 5 log 10 h < —21 but 
decreases toward the faint end as 



Pi 



^Q-QiN(MK,i+21-51og 10 h) 



(12) 



where am is a constant describing how fast the incompleteness changes with magnitude, 
and its value is equal to the difference in the faint-end slope between the incomplete and 
complete samples. Because the exact value of a m is uncertain, we treat is as a free parameter 
with a prior distribution based on the result of IBell et al.l ( 12003bl ). The observations suggest 
that the luminosity function at the faint end, if fitted by a power law, may take any slope 
between —0.93 and —1.33. Hence, we assume that the angle in logarithmic space between the 
power law of a complete luminosity function and the directly observed power law of L -0 - 93 
has a uniform probability distribution: arctan(aiN) is uniformly distributed between and 
arctan(— 0.93) — arctan(— 1.33) = 0.177. Thus, if the predicted faint-end slope is —1.33, then 
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the faint-end slope to be observed could be a random value anywhere between —1.33 and 
-0.93. 

Taking into account the incompleteness, the logarithmic Poisson likelihood for the lumi- 
nosity function is 

k 

InL = [n a (Mi) In (p i $ l M l V i ) - p&iMM - T(l + n a (M,))] . (13) 

i=0 

Thus, the Poisson likelihood is not only a function of the model parameter vector 9 but is 
also a function of am. We treat am as a parameter in the inference and marginalise over it 
to compute the observables using equation (j2J). 



5 CONSTRAINTS ON THE MODEL PARAMETERS 

To obtain samples from the posterior distribution, we run the Tempered Differential Evolu- 
tion MCMC algorithm with 256 chains in parallel. We choose T = 9 for the initial run and 
obtain convergence in 4500 iterations. The Markov chain broadly explores the parameter 
space in every dimension and converges to real modes. We then resume the simulation from 
the state at the 4500th iteration with T = 1 to sample the true posterior. This procedure 
accelerates the convergence at the fiducial level with T = 1. To achieve good mixing, we set 
a high maximum temperature, T max = 128 for the first powered-up level and T max = 1024 
for the fiducial level, for the tempering steps, which occur for every 21 regular Differential 
Evolution steps. We stop the simulation after 8000 iterations. Our Gelman-Rubin test finds 
that the chains converge after 3000 iterations and we identify 14 outlier chains. We include 
the chain states of the last 3500 iterations, 847,000 states in total, to summarise the poste- 
rior. The auto-correlation length is about 20, implying that there are approximately 40,000 
independent chain states. 



5.1 The posterior distribution 

Figure Q] shows the marginalised posterior distribution of the 14 parameters of t he model 



Lu et al 



family in question. The posterior preserves many of the features that we saw in 
(120111 ). which used a synthetic version of the galaxy stellar mass function as the observa- 
tional constraint. For example, the degeneracies between S SF and a§F and between orh and 
/3rh, and the strong constraint on the parameter Vsf, are clearly seen in both posterior dis- 
tributions. In addition, the power indices (3sf and /3rh are constrained to have large values 
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of about 10, again similar to the results obtained in 



Lu et al. 



not surprising, since the stellar mass function used in iLu et al 



(2011). These similarities are 



(120 111 ) is derived from the 



K-band luminosity function of galaxies. However, the posterior distrib ution for s ome o f the 



parameters obtained here differs significantly from that obtained in 



Lu et al. 



(1201 If ). For 

example, in the stellar mass function constrained posterior, the cooling cutoff mass, Mqc, 
and the coefficient for the dynamical friction timescale, /df are strongly degenerate and 
bimodal: a model could have a large cooling cutoff mass, which implies weak AGN feedback, 
if the dynamical friction timescale was long. However, using the K-band luminosity function 
constraint, models with very large Mqc are strongly disfavoured. The marginalised posterior 
only shows one dominant mode with lower Mcc and /df values. Similar changes also occur 
in some other parameters: e.g. /?sf and /3rh- These differences largely owe to the different 
error models. As described in §H the error model used in this paper is realisti c, and so the 
resulting posterior distribution is reliable, unlike those used in ILu et al.l (120111 ). 

Figure [2] shows the predicted .fT-band luminosity function at z = 0. The solid black lines 
with error bars shown in the left panel are the observational results. The blue li ne sketches 



the es timated faint end of the luminosity function corrected for incompleteness (jBeil et al. 



2003bJ). The yellow bands encompass the 95% confidence range of the predictions, while the 



red solid line is the median. Clearly, the model family considered here can accommodate 
the observed .fT-band luminosity function remarkably well. This is also demonstrated clearly 
with the posterior predictive check (PPC) described in §2.3| as is shown in the right panel of 
Figure |2]with the corresponding value, ps = 0.662, given in the panel. Note that we only use 
all the magnitude bins with Mk — 5 log 10 h > —21 to perform the PPC, as we have assumed 
that the faint end of the observed luminosity function is incomplete. 



5.2 Comparison with other semi-analytic models 



Here, we com pare our posterio r distribution with the model parameters used in other SAMs. 



As detailed in 



Lu et al. 



(120 111 ), our models of star formation and feedback encompass many 



published models as s ubsets. For example, our star 



as the Galform model (I Cole et al. 



2000 



Bower et al. 



orma tion model works in the same way 



20061 1 except that we include a variable 



cold gas surface density threshold for star formation. Our star formation model can also 



be reduced to the Munich model (jCroton et al.l 120061 1 by setting the parameter /3sf to 



Our SN feedback model is similar to the Munich model but allows more parameters to vary. 
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Inspecting our posterior distribution, one can find that s ome of the modes that we identify 



are broadly consistent with th ose found by other s tudies, 
edisc m the model proposed by 



Croton et al. 



Henriques et al. 



O2009|) found that 



(120061 ). which corresponds to « rh in our model, 



Somerville et al. 



i s req uired to be as high as about 10. The SN feedback parameters in 
( 120081 ). €sn and ojrh, which correspond to our q;rh and /3rh, were tuned to be 2 and 1.3 
These val ues are right on th e ridge of the marginalised posterior distribution of those di- 



mensions. 



Bower et al. 



(120101 ) found that the normalisation for the star formation efficiency 
is as low as about 0.003, which is also similar to our mode for «sf- The dynamical fric- 
tion time scale coefficient obtained here is in broad agreement with the merging time scales 
adopted in other SAMs. The posterior distribution shows that our /df is a few times larger 
than the corresponding coefficients in the Durham model and the Munich model. However, 
those models use the satellite galaxy's mass whereas we use its halo mass to compute the 
timescale, so their model parameters actually agree well with the posterior for the parameter 
in our model. Our prescrip tion for merger-triggered s tarbursts is very simila r to the Munich 



model ( Croton et al. 



20061 ) and the Somerville model (ISomerville et al.ll2008l ). and the model 



parameters they adopted are contained in the modes of the posterior we obtain. All these 
similarities between our model and other existing models and the consistency between the 
posterior modes we obtain and the parameter values adopted in other SAMs imply that, 
although our inference is based on a specific model family, our inference may hold for other 
SAMs. 



6 MODEL PREDICTIONS VERSUS OBSERVATIONAL DATA 

In the following we concentrate on a number of important observables and examine how 
our model predictions compare with available observations. To obtain the predictions using 
equation (j2J), we randomly select 1000 samples from posterior distribution. For presentation, 
we also randomly select 8 parameter sets from the posterior sample and plot their individual 
predictions for some observables. 



6.1 The local HI mass function 



Figure [3] shows the HI gas mass function of galaxies at z = predicted by our constrained 
model comp ared with the obser vational data of the HI gas mass function of local galaxies 



obtained by 



Zwaan et al. 



( 120051 ). To convert the cold gas mass in the model to the mass 
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of cold hydrogen, we multiply the predicted cold gas mass by a factor (3 = 0.74, the mass 
fraction of hydrogen in neu tral gas with the rest consistin g; of helium (He) and a minor 
fraction of heavier elements (jObreschkow Sz Rawlingsll2009l ). Furthermore, since part of the 
cold hydrogen gas may be in H2 instead of in HI, the contribution of H2 has to be consid- 
ered when comparing our model predictions with the observational data. Unfortunately, our 
current model does not trace the formation of H2 so we use a simple model to i nclude the 
contribution of H 2 . According to the observational results of iKeres et al.l ( 120031 ). the total 
mass density of H 2 in the local universe is about 0.4 times the that of the HI gas. Assuming 
that the H 2 mass in a galaxy is proportional to its HI mass, we obtain the HI mass by mul- 
tiplying the predicted cold hydrogen mass by 1/1.4. The model predictions shown in Figure 
[3] are, therefore, the cold gas mass functi on predicted by the model multiplied by 0.74/1.4. 



A similar conversion factor is adopted by 



Power et al. 



(120 ioh . 



The predicted cold gas mass function is higher than the observed function by a factor 
of more than 5. The turn down at low HI masses is artificial and results from the mass 
resolution limit of the halo merger trees used here, which is 4.5 x 10 9 M Q . Not surprisingly, 
our PPC indicates a significant difference between the model predictions and the data, with 
Pb = 0.000. This occurs because the total fraction of baryons in stars is small compared to 
the total amount of gas that can cool, and the feedback is not sufficient to remove the cold gas 
from the galaxies. Consequently, a large amount of the cooled gas has to remain as cold gas 
in galaxies. As discussed in $5], the posterior distribution of the parameters characterising the 
efficiency of supernova feedback is already pushed to the extreme, suggesting that the current 
model family may not be able to accommodate the observed stellar and cold gas contents 
of local galaxies simultaneously. The problem of overpredicting the cold gas m ass function 



i s not only in the model family we consider in the present paper. Recently, I Wang et al. 



(120 111 ) also showed the same problem in yarious models w ith different parameterisations 



of star formation. As first pointed out by 



Mo et al. 



(120051 ). this is a generic problem for 



current models of galaxy formation that use supernova feedback to reheat and eject gas from 
galaxies. Either supernova feedback is severely underestimated, or some other process might 
be responsible for preventing gas from being accreted by galaxy halos in the first place, such 



as preheating (IMo &: Mao 



2002 



Mo et al. 



2005 



Lu k Mo 



20071 ). In a forthcoming paper, we 



will use both the galaxy luminosity function and the HI mass function as joint observational 
constraints to study their implications for star formation and feedback. 
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6.2 The Tully-Fisher relation 

Figure H] shows the Tully-Fisher relations predicted by 8 randomly selected m odels from the 
poster ior distribu tion compared with the observationally derived data from 



(1201 lh . Following 



Dutton et al. 



Dutton et al 



(j201l[ ). we plot the ma ximum rotation velo cities (V maiX ) of 



galaxies versus their total stellar masses. As shown in 
Tully-Fisher relation follows a simple power law, 



Dutton et al. 



(2011). the observed 



log 



(■ 



2.179 + 0.259 log 



(14) 



Vkms-V ' '" ' "' °V1O 1O - 3 M 0/ 
and the la uncertainty in the zero point is about 0.005. For comparison, this power law is 
shown as the straight line in each panel. To mimic the observations, we only select disk- 
dominated central galaxies at z = that have bulge/total stellar masses smaller than 1/10. 
Unlike the stellar mass, the maximum rotation velocity is not a direct prediction of our 
model. As a simple model, we use the peak circular velocity of the host dark matter halo 
as a proxy for V mauX , ignoring any effects owing to the baryonic component. The halo peak 
cir cular velocity i s com puted according to the circular velocity — halo mass relation obtained 



by iKlypin et al.l (120111 ) from TV-body simulations: 



2.8 x 10- 



rn, 



0.316 



km/s . 



(15) 



v h-m Q ) 

where V max is the halo peak circular velocity and m v ; r is the halo virial mass. With all 
these assumptions, the predicted Tully-Fisher relation is independent of the bulge/total 
ratio adopted to select disk-dominated galaxies. 

As demonstrated in previous investigations, the amplitude of the Tully-Fisher relation 



can be reproduced roughly in the c urrent CDM mod e 
of a central galaxy is ignored (e.g. 



Choi et al. 



2006 



if halo contraction owing to the growth 



Dutton et al. 



20071 ). Our results shown 



in Figure H] are consistent with these investigations. However, unlike the observed power law 
relation, the predicted relations are concave upward. The figure illustrates that differences 
between the model predictions and the data are significant. The green asterisks show the 
averaged log Vmax in ten log M bins. We use the binned results to perform a p- value PPC 
and find ps = 0.005, which suggests a poor fit to the data. 

This predicted curved shape is a direct consequence of the halo mass-stellar mass re- 
lation found for central galaxies, which shows that the halo to stellar mass ratio is the 
lowest for halos with masses ~ 10 12 M and goes up towards both the low- and high-mass 



ends (jYang et al. 



2003 



20081 ). This curved shape is also seen in other semi-analytic mod- 
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els (e.g 



Croton et al. 


2006; 


Benson 


2012) 



20121 ). There are at least two possible explanations for 



this mismatch. First, the observational relation may be subject to selection effects that are 
not included in the model predictions. For example, many of the galaxies at the low- and 
high-mass ends might not be spirals observationally. Second, including halo-galaxy inter- 
actions may redu ce variations in the predicted Tully-Fisher relation. Indeed, as shown in 



Mo fc Mad (120001 ) . the interaction between the dark matter halo and the disk as given by 



adiabatic contraction can reduce the scatter in the Tully-Fisher relation produced by a vari- 
ation in the baryon fraction in galaxies, making the predicted Tully-Fisher relation closer to 
a power law. Unfortunately, this effect will also boost V mSuX , changing the overall amplitude 
of the predicted relation. Our results show that the boost has to be weak for the model 
predictions to match the overall Tully-Fisher amplitude. It is unclear if a consistent model 



can be found along these lines, or if galaxy halos have density profi 



Mo & Mao 



2000 



Weinberg fc Katz 



es shallower than those 



predicted by CDM models (e.g. 

with the baryonic component can m ake a halo profile shallower (e.g. 



20021). or if interactions 



El-Zant et al. 



2001 



Mo fc Mao 



Binney et al. 



2001 



20041 ). 



6.3 The colour distribution 

Figure [5] shows the distribution of galaxies in the g — r colour versus r-band magnitude 
plane derived from SDSS DR7 and the same distribution predicted by 8 individual models 
randomly selected from the posterior distribution. Clearly, the model predictions are diverse. 
Some models, such as the one shown in the middle of the upper row, can reproduce the 
bimodal distribution seen in the z ~ galaxy population. However, many other models 
do not predict any bimodality, and the predicted colours may be bluer or redder than the 
observed distribution. We select the region of the diagram enclosed by the magenta square, 
and divide the region into 25 x 25 bins. We renormalise the colour distribution in each 
magnitude bin for all the prediction samples and the observational data, and use these bins 
to perform a p- value PPC with a result that ps = 0.000. These results suggest that the model 
parameters constrained by the i^-band luminosity function alone do not provide significant 
constraints on the colour — magnitude relation. This also implies that the observed colour 
distribution can provide a constraint that is complementary to the one provided by the 
luminosity (stellar-mass) function. 
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6.4 The conditional stellar mass function 



We also make predictions for the conditional stellar mass function (CSMF) of galaxies at 
z ~ 0, $(M#|Mh), which is defined as the average number of galaxies as a function of galaxy 



stellar mass in host dark matter ha 
with the observed CSMFs given by 



os of a given mass. We c ompare our model predictions 



Yang et al. 



( 2008 



2009a|). Our goal is to check whether 



the model family, which can accommodate the observed total i^-band luminosity function, is 
also able to accommodate the observed stellar contents in halos of different masses. Following 
the presentations in Yang et al., we obtain the CSMFs for the following four halo mass ranges: 
10« _ io 12 -3 fr-iM , 10 12 - 9 - 10 13 - 2 ^Mq, 10 13 - 5 - 10 13 - 8 /i^Mg, and 10 14 ' 4 - 10 14 7 ^Mq. 
The corresponding galaxy populations for those halo masses are modelled with 500, 300, 



100 and 50 sampled halos, respective 
from the halo mass function given by 



y. For each mass range, the halo sample s are drawn 



Sheth fc Tormen 



fll999h and 



Sheth et al. 



(1200 lh . The 



results are shown in Figure [6j For the observational data, each CSMF is separated into two 
parts, the contribution of central galaxies (defined to be the most massive galaxy in a group) 
and the contribution of satellite galaxies (all other galaxies in a group except the central). 
As one can see, halos of lower masses on average contain a smaller number of satellites, 
and so the central term is more prominent in the CSMF. For the model prediction we only 
present the total CSMF for each case, but it is worth noting that our predicted CSMFs 
for the central galaxies match the observational results quite well. As one can see, however, 
the model significantly over-predicts the number of satellite galaxies in low-mass halos. It is 
worth noting that the discrepancy in satellite galaxies does not contradict with the excellent 
fit to the K-band luminosity function, because the field luminosity function is dominated by 
central galaxies at all magnitudes, while the CSMF is more sensitive to the satellite galaxy 
population. Applying the PPC described in §2.31 to the CSMFs, we obtain p B = 0.002, 
0.002, 0.002 and 0.024 for the four mass bins (from low-mass to high-mass), respectively, 
suggesting that the over-prediction is significant for the three low-mass cases and marginally 
significant for the most massiv e case. Th i s resu lt, obtained by exploring a large parameter 



space, reinforces the finding of 



Liu et al. 



(120101 ) that the current SAMs cannot match the 
observed CSMFs in some halo-mass ranges, even though they are able to reproduce the total 
stellar mass function. This discrepancy suggests that some important physics governing 
the evolution of satellite galax ies, such as tidal s tripping and/or tidal disruption, should 



be included in the model (e.g. lYang et al. 



2009b 



Liu et al. 



2010). iKang fc van den Bosch 
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(120081 ) have shown that tidal stripping can effectively reduce the fraction of r ed sa tellite 

fl2009[ ) have 



Kim et al. 



galaxies in their model to achieve a better agreement with the data, 
demonstrated that including tidal disruption and satellite-satellite mergers in their model 
can improve the match to galaxy clustering on small scales. In a forthcoming paper, we will 
use the observed CSMFs as constraints to infer implications for the evolution of satellite 
galaxies. 



6.5 Redshift evolution of cold baryonic masses 



With the advent of large and deep surveys of gal axies, the evolution o f the galaxy stellar 



mass function can now be o 



Oesch et al 



2010 



Yan et al 



jsery ed to z ~ 8 (e.g. 



Bouwens et al. 



2010 



Labbe et al. 



2010 : 



20111 ) . Here, we use our constrained model to predict the stellar 
mass functions at z = 0, 1.15, 2.5 and 4, and compare our model predictions with the 
existing data. We do not consider data at z > 4 because they are still quite uncertain. For 
each of the four redshifts, we use 10 4 halos, sampled from a mass distribution (dN/dm) oc 



m 



-1.5 



to construct merger trees rooted at that redshift, and adopt our posterior parameter 



distribution to predict the galaxy population using those merger trees. We then assign a 
weight to each predicted galaxy according to the ratio between th e halo mass function at 



that observed redshift [again estimated using the iSheth fc Tormenl (11999T ) formula] and the 
mass distribution used to sample the merger trees. The reason for sampling the halo merger 
trees in this way instead of just using the halo mass function is to guarantee that the massive 
halos are well sampled. The stellar mass function at a given redshift is then obtained through 
the weighted counts of the predicted galaxies at the redshift in question. Figure [7] shows the 
predicted stellar mass function at z — 0,1.15,2.5 and 4 compared w i th the observational 



data. The stellar mass functi on at z = is adopted from lLi fc White! (120091 ); the z = 1.15 



Perez- Gonzalez et al. 



mass function is that given by 
from 1 to 1.3; the z = 2.5 mass function is that of 



(200 81) for galaxie s in th e redshift range 



(2009 1) for galaxies in 



Marchesini et al. 



the redshift range from 2 to 3; and the z = 4 mass function is that of 



Stark et al. 



(12009h 



for ga 



axies with 3.2 < z < 4.7. All the stellar mass functions are converted to the Chabrier 
iej 2003 ) used in our model. Because our model is constrained by the K-band 



IMF ( Chabrie 



luminosity function at z — 0, it is not surprising that the predicted stellar mass function 
at z = agrees with the observations. For higher redshifts, the model predictions show a 
larger discrepancy with the observations, although the posterior predictive distributions are 
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quite broad. However, these broad distributions are misleading, because the shapes of the 
predicted stellar mass functions are systematically different from those observed, as shown 
by the predictions of 8 randomly-selected individual models, also plotted in Figure [7] at each 
redshift. To quantify the differences between the model predictions and the observational 
results, we again use the PPC described in §2.31 The results, shown in Figure El clearly 
demonstrate that the differences are significant for all three high- z cases. 

Although the test based on the Principal Components is powerful and general, it is not 
easy to see in which aspects the model prediction fails. Since the stellar mass functions are 
usually characterised by a Schechter function, it might be interesting to have a PPC based 
on the Schechter parameters. Figure [9] shows the predictive posterior distributions of the 
Schechter parameters (contours) compared to the observational values (red crosses with 1-a 
error bars). Here a is the faint-end slope of the Schechter function, and logM* and log0* are 
the logarithms of the characteristic mass and normalisation, respectively. The contours, from 
inside out, denote the 5%, 33%, 67% and 95% confidence levels. It is clear that our model 
predictions agree with the z = observations very well, but deviate from the observations for 
high- z galaxies. The characteristic stellar masses of our predicted galaxies are systematically 
lower than that of the observed galaxy population and the normalisations are systematically 
higher, suggesting that the model over-predicts the number of low-mass galaxies and under- 
pr edicts the number o f high-mass galaxies. T h ese re sults are consistent wit h those presented 



in 



Bower et al. 



(12006h . 



Kitzbichler k White! (120071 ) and 



GuoetaL 



(120111 ). However, these 



authors only showed the predictions for individual parameter sets, while ours are based on 
the entire posterior distribution of the model parameters. 

We also predict the redshift evolution of the total stellar mass density of the universe 
using merger trees rooted at z — 0. Since the mass resolution of the merger trees for all 
the halos is set at 1 x 10 9 h~ 1 M & , the progenitor halos are well sampled at high redshifts. 
At any given redshift, the stellar mass of each modelled galaxy i s weig hted according to 



Sheth fc Tormenl (119991 ) mass function at 



the mass of its descendant halo at z = and the 
z = 0. The stellar mass density is then obtained by summing up the weighted stellar masses 
of all the modelled galaxies with stellar masses larger than 10 8 M Q . Figure [10] shows the 
predicted comoving stellar mass density normalised by the critical dens ity of the Universe at 



the present time, toge ther with the observational results presented in 



Wilkins et al. 



and 



Stark et al. 



(I2008h 



( 120091 ) . To make a fair comparison, we correct the data 



points by taking into 



account the effects of using different IMFs and SPS models. The data in 



Wilkins et al. 



feoosJ) 
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assumed a Salpeter IMF (jSalpeterlll955l ). ma king the stellar mas ses about 70 % higher than 



those one would derive using a Chabrier IMF (IWilkins et al. 



20081 ). The data in lWilkins et al. 



(200 8J) were obtained from various observational measure ments using the BC03 SPS mode l 



(IBruzual fc Chariot 



20031) or the PEGASE SPS model ( IFioc fc Rocca-Volmerangel Il997f ). 



which includes less contributions from AGB stars than the CB07 model and can also over 
estimate the stellar mass by 5-25%, depending on the star formation history in the past Gyr 



(jConroy et al. 



20101 ) . In our model-data comparison shown in Figure [TUl these two effects 



are approximately included by shifting the data points downwards by a factor of 1.9. As 
one can see, the predicted stellar mass density at z = matches the observational results 
well. Again, this is not surprising, as the model is constrained by the i^-band luminosity 
function at the present time. Moreover, the predicted stellar mass density decreases with 
increasing redshift, a trend similar to that in the data. However, the model predictions are 
systematically higher than the observational data at z > 0. 

In Figure [11] we compare the model predict ions for the sta r formation rate den sity as a 



function of redshift with the data collected in 



Hopkins! ( 12004] ) . Since the data in iHopkins 



( 120041 ) are based on the assumption of a Salpeter IMF, we shift the data points downwards 



by a factor of 1.59 to account 



or differences in t he star formation rates between the Salpeter 



IMF and the Chabrier IMF ( iLeroy et al.ll2008l ). The model predictions agree with the ob- 



servational results at z = 0. This is not trivial because the star formation rate is not used as 
a constraint. It is also remarkable that the model reproduces the overall trend of the evolu- 
tion. In detail, the predicted increase of the star formation rate density with redshift below 
z = 1 appears slower than that in the observations. At higher redshifts, the predicted star 
formation rate density declines mildly with redshift, while the observations show a roughly 
constant rate over a large range of redshift. The largest discrepancy occurs in the redshift 
range between 0.5 and 3, where most of the data points lie above the model predictions. This 
suggests that the model underpredicts the star formation rate at high redshift. However, as 
we have shown above, the same model actually overpredicts the stellar mass density. Thus, 
the discrepancy between the model predictions and the data cannot be solved simply by 
changing the overall star formation or feedback efficiencies. It indicates that the data sets of 
the stellar mass density and the star formation rate density are mutually inconsistent either 
because of uncertainties in the observations or because some assumptions used to derive the 
stellar mass and star formation rate from observables may be incorrect. For example, the 
data of the stellar mass and star formation rate are all derived with the assumption that the 
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IMF is universal, while in reality the IMF may vary with redshift (IDave fc Oppenheimer 



2007 



Fardal et al. 



20071 ). Indeed, it has been demonstrated that if the IMF is more "top- 



heavy" (or "bottom-light" ) at higher redshifts, the discrepancy between the star formatio n 



rate and the stellar mass density can be alleviated (IWilkins et al. 



2008 



Kang et al.ll2010h . 



Our results demonstrate that, even if the model parameters are varied over a large range, 
the current model family (which assumes a universal IMF) may still not be able to match 
the data over the observed redshift range, suggesting that a redshift-dependent IMF might 
be necessary. 

In Figure [12J we show the predictions of the comoving cold gas mass density, normalised 
by the critical density of the Universe at z = 0, as a function of redshift. Here again we 
make corrections for the contributions of He and H 2 using the simple models described 



galaxi es (jRao fc Briggs 



1993 



Zwaan et al 



1997 



20051 ) or from empirica 



2003 af), and with high- re dshift measurements based on DLA systems (IPeroux et al 



Prochaska &: Wolfe 



models ( 


Bell 


et al. 


(Peroux et al. 


2003; 



20091 ). Once again the model significantly overpredicts the cold gas mass 



at low redshifts. In particular, the model predicts an increasing trend of the cosmic cold 
gas mass density with decreasing redshift, whereas the data show that the cold gas density 
actually decreases with time at z < 2. This suggests that any processes that reduces the 
cold gas content of galaxies must be time- dependen t, operating effec tively only at relatively 
low redshifts. The preheating model advocated by 



Mo et al. 



(120051 ) has this property, and 



we will use our Bayesian SAM to explore this possibility in an upcoming paper. 



7 DISCUSSION 

We have used a Bayesian SAM of galaxy formation to make model inferences from the 
observed i^-band luminosity function of galaxies. We found that some of the free parameters 
specifying our model family are well constrained even with this single data set, and the 
posterior distribution contains the parameters adopted in some existing SAMs. We have 
used the posterior distribution to make predictions for the colour-magnitude relation of 
galaxies, the Tully-Fisher relation, the conditional stellar mass function of galaxies in halos 
of different masses, the HI mass function, the redshift evolution of the stellar mass function 
of galaxies, and the star formation history, all with their full inference uncertainties. The 
information in the available data can be used to check the model. Comparing the model 
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predictions with available observational results we have found that the current model family, 
although covering a large model space, still has serious tensions with the observational data. 
It over-predicts the satellite fraction, and vastly over-predicts the HI mass function at z ~ 0. 
It predicts stellar mass functions that are too steep at high redshift. It predicts a redshift 
evolution of the stellar mass density and the star formation history of the Universe that 
are in conflict with the observations. It predicts Tully-Fisher relations that are not well 
described by a pure power-law relation between galaxy stellar mass and rotation velocity. 
These discrepancies suggest that the current model family may still miss some processes 
important for galaxy formation and evolution. 

Current SAMs over-predict the satellite fraction. Comparing the conditional stellar mass 
functions o f galaxi e s predicted by fo ur popular SAMs with the observational results of 



Yang et al.l f j2009ah . 



Liu et al. 



(120101 ) found that all of the SAMs over-predict the satel- 
lite fraction by a factor of two or more. Since our model family covers a large parameter 
space, our results demonstrate that this is a generic p roblem for the cu rrent model family, 
rather than just for the specific models considered by iLiu et al.l (12010I ). There are at least 
two ways to address this problem, both requiring an extension of the current model family to 
include some new processes. The first is to suppress star formation in dark matter halos at 
high redshift. However, this will further exacerbate the current underprediction in the star 
formation history at high-z (see Fig. ITT]) . The second is to introduce tidal stripping and dis- 
ruption to reduce the number of satellites. Observationally, there are indications that some 
satellite galaxies are being destroyed b y the tidal forces o: 



their hosts and/or by interactions 
with substructures in their hosts (e.g. iMihos et al.l 120051 1 . In addition, recent observations 
have revealed the existenc e of halo stars in clusters and g roups of galaxies (jZibetti et al. 



2005 



Gonzalez et al 



2005 



Krick et al. 



2006 



Zibetti 



stripped from satellite galaxies. As discussed in 



2008) 



Yang et al. 



, whic h are believed t o be stars 
fl2009bf ) and bu et all feoiOf h the 



observed satellite population can be better reproduced when one allows for a halo component 
of stars. In a forthcoming paper, we will use the observed conditional luminosity functions 
of galaxies as additional constraints, and use Bayesian evidence to examine whether a new 
model family including tidal disruption is favoured over the original model family. The re- 
sulting posterior will then be used to predict the amount of halo stars in halos of different 
masses and to check whether the model predictions are consistent with observations. 

The current model family constrained by the i^-band luminosity function va stly over- 



predicts the Hi-mass function. This problem has not been widely recognised (but see 
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20051 ). as many of the early investigations focused only on the stellar component. In the cur- 



rent ACDM model considered here, the baryon component is about 17% of the total mass 
density of the universe, while only a small fraction, « 10%, of the baryons are in stars. 
In low- ma ss halos where most of the barvonic matter is accreted through the cold- mode 



accretion (iBirnboim &: Dekel 



2003 



Keres et al 



2005 



20091) and radiative cooling is very ef- 



ficient, the fraction of the gas that has not been locked into stars must be in the cold phase, 
unless the gas is heated or ejected by some feedback process. In most SAMs, including the 
model family considered here, gas accretes into dark matter halos and cools, but then can 
be heated in and/or ejected from halos by supernova explosions associated with star forma- 
tion. However, the total energy produced is limited. As we have seen, the energy required to 
reduce star formation sufficiently is already a large fraction of the total energy produced. To 
remove most of the cold gas so that the resulting Hi-mass function matches the observed one 
requires even higher efficiencies. Even worse, numerical simulations have demonstrated that 
the efficie ncy of supernova feedback in reducing the cold gas in low-mass galaxies is actually 



very low (IMac Low fc Ferrara 



19991 ). All of this suggests that supernova feedback as imple- 



mente d in current SA Ms may not be responsible for suppressin g star formation i n low- mass 



halos flMo 



Mo et al. 



et al 



20051 ). One alternative possibility, proposed bv lMo fc Maol (120021 . 12004 ) and 



(120051 ) but not yet thoroughly investigated, is the preheating of the intergalactic 



gas, which results i n a reduced fraction of the gas that can be accreted by low-mass ha- 



los ( Lu fc Mo 



20071 ). Such preheating might be pro duced by star formation and/or AGN 



activity during early phases of rapid star formation (IMo fc Mao 



2002 



2004 ). or by shocks 



assoc iated with the formation of large-scale pancakes in the cosmic density field ( ]Mo et al 



20051 ). In a forthcoming paper, we will explore such a model family. 



The current model family also predicts stellar mass functions for high-;? galaxies that is 
different than those observed, i.e. it predicts too many low-mass galaxies and an insufficient 
number of massive galaxies. There are at least two explanations for this discrepancy. First, 
the current model family might not properly take into account the redshift dependence of 
star formation. In our model, merger-driven star bursts are distinguished from quiescent star 
formation, so that a redshift-dependence of star formation owing to the redshift dependence 
of the galaxy merger rate naturally occurs. However, it appears that this effect alone is 
insufficient. Another process that may lead to a redshift-dependent star formation rate 
is the accretion of cold gas into galaxies. Gas accretion at high-^ is dominated by cold- 



mode accretion, while hot-mode accretion dominates at low-z (IKeres et al. 



2005 



20091). Gas 
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accretion, and hence star formation, proceeds faster in galaxies with cold-mode accretion 
than in those with hot-mode accretion, so more stars would form at high z. Our current 
model family does not distinguish between cold and hot mode accretion explicitly, and it 
would certainly be interesting to explore models that do. Second, the merger time scales at 
high- z may be overestimated relative to those at low-z. If the mergers of low-mass galaxies 
occurred more frequently at high z, the number of small galaxies might be reduced, while the 
number of massive galaxies would be increased by the larger number of merger remnants. It 
is possible that the Chandrasekhar dynamical friction formula overestimates the merger time 
scales because accretion at high-z is dominated by mergers along a few filaments, making 
the galaxies embedded in them merge with the central galaxy faster. It would be interesting 
to quantify such effects with numerical simulations and to include them in a semi-analytic 
model to explore its impact on the redshift evolution of the galaxy luminosity function. 

The current model family predicts a star formation history that is lower than that ob- 
served at high z. Since our model predictions match the stellar density at the present epoch, 
simply increasing the star formation rate at high-z would over-predict the total stellar mass 
density at the present time. There is also not much room to reduce the star formation rate 
at low-2 to compensate for the increase at high-z, as the current model predicts a star for- 
mation history at \ow-z that matches the observations. A redshift-dependent stellar initial 
mass function (IMF) may address this discrepancy. If the IMF is top heavy (or bottom light) 
at high redshift because of a different star formation mode, e.g. in merger driven starbursts, 
the observed star formation history for a universal IMF would overestimate the true star 
formation rate. This would help alleviate the discrepancy between our model predictions and 
observations. However, a non-universal IMF would have other observational consequences 
and a systematic analysis of such consequences would be required to show that such an IMF 
is indeed preferred. 

Finally the current model family predicts Tully-Fisher relations that are curved, sug- 
gesting that either selection effects in the observational samples are not properly taken into 
account in the model or an interaction between the baryonic and dark matter components 
plays a crucial role in shaping the observed Tully-Fisher relation. It is important, as a next 
step, to include a detailed model for the rotation velocities, and to compare the model pre- 
dictions with an ob servational sample where the selection effects are better understood (e.g 



Pizagno et al. 



20071 ). 



Our Bayesian SAM allows us to explore the various possibilities mentioned above with 
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probabilistic rigour. The Bayesian 'goodness of fit' provided by the posterior predictive 
check helps assess the admissibility of model families, and Bayesian evidence can be used 
to discriminate between different model families using the observational data. In a series 
of forthcoming papers we will use these tools of Bayesian inference to address the various 
problems identified above. 
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Table 1. Model parameters 



# 


Parameter 


Meaning 


Prior 


1 


logMcc(M a ) 


cooling cut-off halo mass 


[1.5 , 4.5] 


2 


logCtSF 


star formation efficiency power-law amplitude 


[-3, 0] 


3 


0SF 


star formation efficiency power-law index 


[-1, 12] 


4 


logVsF (km/s) 


star formation law turn-over halo circular velocity 


[1.5, 3.0] 


5 


log/ SP (M /pc 2 ) 


star formation threshold gas surface density 


[-1- 3] 


6 


log OSN 


SN feedback energy fraction 


[-3, 1] 


7 


log OJR.H 


SN feedback reheating power-law amplitude 


[-3, 2] 


8 


/3rh 


SN feedback reheating power-law index 


[0, 14] 


9 


loge w 


fraction of surplus SN feedback energy used for powering wind 


[-3, 0] 


10 


log /ri 


fraction of re-infall ejected hot gas 


[-2, 0] 


11 


log /df 


merging time-scale in dynamical friction time-scale 


[0, 2] 


12 


log asB 


merger triggered star burst efficiency power-law amplitude 


[-2, 0] 


13 


/?SB 


merger triggered star burst efficiency power-law index 


[0, 2] 


14 


arctan(oiN) 


faint-end incompleteness 


[0, 0.177] 
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Figure 1. The 2-D and 1-D marginalised posterior probability density distributions for the 14 free parameters. 
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Figure 2. The left panel shows the Bayesian posterior predictions of the K-band luminosity function at the 
present time. The posterior is constrained by the K-band luminosity function. The black solid line with 
error bars shows the observational data. The yellow band encompasses the 95% confidence range of the 
predictions and the red line denotes the median yalne of t.Vie pred ictions The blue solid line shows the 
estimated faint end corrected for incompleteness (jBell et al.ll2003bl ). The right panel shows the posterior 
predictive distribution of the test quantity T for the K-band luminosity function. The red line marks the 
position of the observed luminosity function in the distribution. 
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Figure 3. The posterior predictions of the HI ga s mass function at t he present time compared with the HI 
gas mass function of local galaxies obtained by IZwaan et al.l J200a ). The black solid line with error bars 
denotes the observational data. The yellow bands encompasses the 95% confidence range of the predictions 
and the red lines denote their median value. The downturn at low masses is caused by resolution effects. 
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Figure 4. The stellar mass Tnlly-Fishpr relation predicted by 8 models randomly selected from the posterior 
compared with data from | r)ntt.on p.t a.l I ipm jl ) show n in the upper- left panel. The red line denotes a fit to 
the observational data given bv lDutton et all (|201l[ ). 
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Figure 5. The colour - magnitude diagram predicted by 8 models randomly selected from the posterior 
compared with observational data from SDSS (the upper-left panel). The magenta dotted line encloses the 
square region that is used to conduct a PPC. 
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Figure 6. The Bayesian posterior predictions of the conditional galaxy stellar mass functions for four halo 
masses at the present time. The halo mass ranges are noted in each panel. The black solid lines with error 
bars denote the observed CSMFs for all galaxies that reside in halos with the corresponding virial masses. 
The blue dashed lines show the CSMFs for central galaxies only and the red dotted lines show that of 
satellite galaxies. The yellow bands encompass the 95% confidence range of the predictions for the satellite 
galaxies and the red lines denote their median value. 
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Figure 7. The posterior predicted stellar mass functions at z =0, 1.15, 2.5 and 4. The yellow bands enclose 
95% confidence range and the red line plots the median. The blue dashed lines denote the predictions of 8 
models randomly selected from the posterior sample The Hata are the black solid lines with error bars Trie 



stellar mass fu nction for z = is fro 
z = 2.5 is from iMarchesini et al.l ( 200 



i>), 



Li k White '(200 



Perez-Gonzalez et al.l 



and z = 4 is from Stark et al. (200E 



ars I he 
] (|2008l) . 
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Figure 8. The posterior predicted distribution of the test quantity T for the galaxy mass functions at four 
different redshifts. The p-value for each redshift is labelled in the corresponding panel. 
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Figure 9. The posterior predictive distribution of the Schechter function parameters of the predicted stellar 
mass functions at z = 0, 1.15, 2.5 and 4. The contours enclose the 5%, 33%, 67% and 95% confidence levels. 
The red crosses denote the fitted values for the corresponding observational results. 
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Figure 10. The model predictions for the comoving stellar mass density of the universe normalised by the 
present day critical density compared with observational data. The yellow band encompasses the 95% con- 
fidence range and the red solid line shows the median value of the predictions. The points with error bars 
show various observational estimates. 
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Figure 11. The posterior predicted comoving star formation rate density of the universe. The yellow band 
encompasses the 95% confidence range, and the red solid line shows the median values of the predictions. 
The points with error bars show observational estimates. 
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Figure 12. The posterior predicted comoving cold gas mass density of the universe normalised by the critical 
density of of the universe at the present time. The yellow band encompasses the 95% confidence range of 
the model predictions while the red solid line shows the median. Points with error bars show observational 
estimates. 
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